ABSTRACT



A multi-layer switch search engine architecture is provided. 
According to one aspect of the present invention, a switch 
fabric includes a search engine and a packet header processing
unit. The search engine may be coupled to a forwarding
database memory and one or more input ports. The
search engine is configured to schedule and perform 
accesses to the forwarding database memory and to transfer 
forwarding decisions to the one or more input ports. The 
header processing unit is coupled to the search engine and 
includes an arbitrated interface for coupling to the one or 
more input ports. The header processing unit is configured to 
receive a packet header from one or more of the input ports 
and is further configured to construct a search key for 
accessing the forwarding database memory based upon a 
predetermined portion of the packet header. The predeter- 
mined portion of the packet header is selected based upon a 
packet class with which the packet header is associated. 

23 Claims, 9 Drawing Sheets 
SEARCH ENGINE ARCHITECTURE FOR A HIGH PERFORMANCE MULTI-LAYER
SWITCH ELEMENT

FIELD OF THE INVENTION

The invention relates generally to the field of computer
networking devices. More particularly, the invention relates
to a multi-layer switch search engine architecture.

BACKGROUND OF THE INVENTION

Local area networks (LANs) have become quite sophisticated in
architecture. Originally, LANs were thought of as a single wire
connecting a few computers. Today LANs are implemented in
complicated configurations to enhance functionality and
flexibility. In such a network, packets are transmitted from a
source device to a destination device; in more expansive
networks, this packet can travel through one or more switches
and/or routers. Standards have been set to define the packet
structure and layers of functionality and sophistication of a
network. For example, the TCP/IP protocol stack defines four
distinct layers, e.g., the physical layer (layer 1), data link
layer (layer 2), network layer (layer 3), and transport layer
(layer 4). A network device may be capable of supporting one or
more of the layers and refer to particular fields of the header
accordingly.

Today, typical LANs utilize a combination of Layer 2 (data link
layer) and Layer 3 (network layer) network devices. In order to
meet the ever increasing performance demands from the network,
functionality that has been traditionally performed in software
and/or in separate layer 2 and layer 3 devices has migrated into
one multi-layer device or switch that implements the performance
critical functions in hardware.

One of the critical aspects for achieving a cost-effective
high-performance switch implementation is the architecture of
the forwarding database search engine, which is the centerpiece
of every switch design. Therefore, it is desirable to optimize
partitioning of the functional modules, provide efficient
interaction between the search engine and its "clients" (e.g.,
switch input ports and the central processing unit), and
optimize the execution order of events, all of which play a
crucial role in the overall performance of the switching fabric.
Also, it is desirable to support diverse traffic types and
policies by providing flexibility to match different packet
header fields. Ideally this architecture should also allow for a
very high level of integration in silicon, and linearly scale in
performance with the advances in silicon technology.

SUMMARY OF THE INVENTION

A multi-layer switch search engine architecture is described.
According to one aspect of the present invention, a switch
fabric includes a search engine and a packet header processing
unit. The search engine may be coupled to a forwarding database
memory and one or more input ports. The search engine is
configured to schedule and perform accesses to the forwarding
database memory and to transfer forwarding decisions to the one
or more input ports. The header processing unit is coupled to
the search engine and includes an arbitrated interface for
coupling to the one or more input ports. The header processing
unit is configured to receive a packet header from one or more
of the input ports and is further configured to construct a
search key for accessing the forwarding database memory based
upon a predetermined portion of the packet header. The
predetermined portion of the packet header is selected based
upon a packet class with which the packet header is associated.

Other features of the present invention will be apparent from
the accompanying drawings and from the detailed description
which follows.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not
by way of limitation, in the figures of the accompanying
drawings and in which like reference numerals refer to similar
elements and in which:

FIG. 1 illustrates a switch according to one embodiment of the
present invention.

FIG. 2 is a simplified block diagram of an exemplary switch
element that may be utilized in the switch of FIG. 1.

FIG. 3 is a block diagram of the switch fabric of FIG. 2
according to one embodiment of the present invention.

FIG. 4 illustrates the portions of a generic packet header that
are operated upon by the pipelined header preprocessing
subblocks of FIG. 5 according to one embodiment of the present
invention.

FIG. 5 illustrates pipelined header preprocessing subblocks of
header processing logic of FIG. 3 according to one embodiment of
the present invention.

FIG. 6 illustrates a physical organization of the forwarding
memory of FIG. 2 according to one embodiment of the present
invention.

FIG. 7 is a flow diagram illustrating the forwarding database
memory search supercycle decision logic according to one
embodiment of the present invention.

FIGS. 8A-C are timing diagrams illustrating three exemplary
forwarding database memory search supercycles.

FIG. 9 is a flow diagram illustrating generalized command
processing for typical forwarding database memory access
commands according to one embodiment of the present invention.

DETAILED DESCRIPTION

A search engine architecture for a high performance multi-layer
switch element is described. In the following description, for
the purposes of explanation, numerous specific details are set
forth in order to provide a thorough understanding of the
present invention. It will be apparent, however, to one skilled
in the art that the present invention may be practiced without
some of these specific details. In other instances, well-known
structures and devices are shown in block diagram form.

The present invention includes various steps, which will be
described below. While the steps of the present invention are
preferably performed by the hardware components described below,
the steps may alternatively be embodied in machine-executable
instructions, which may be used to cause a general-purpose or
special-purpose processor programmed with the instructions to
perform the steps. Further, embodiments of the present invention
will be described with reference to a high speed Ethernet switch
employing a combination of random access memory (RAM) and
content addressable memories (CAMs). However, the method and
apparatus described herein are equally applicable to other types
of network devices such as repeaters, bridges, routers,
brouters, and other network devices and also alternative memory
types and arrangements.

AN EXEMPLARY NETWORK ELEMENT

An overview of one embodiment of a network element
that operates in accordance with the teachings of the present
invention is illustrated in FIG. 1. The network element is
used to interconnect a number of nodes and end-stations in 
a variety of different ways. In particular, an application of 
the multi-layer distributed network element (MLDNE) 
would be to route packets according to predefined routing
protocols over a homogenous data link layer such as the 
IEEE 802.3 standard, also known as the Ethernet. Other 
routing protocols can also be used. 

The MLDNE's distributed architecture can be configured
to route message traffic in accordance with a number of
known or future routing algorithms. In a preferred 
embodiment, the MLDNE is configured to handle message 
traffic using the Internet suite of protocols, and more spe- 
cifically the Transmission Control Protocol (TCP) and the 
Internet Protocol (IP) over the Ethernet LAN standard and
medium access control (MAC) data link layer. The TCP is
also referred to here as a Layer 4 protocol, while the IP is
referred to repeatedly as a Layer 3 protocol.

In one embodiment of the MLDNE, a network element is 
configured to implement packet routing functions in a
distributed manner, i.e., different parts of a function are per-
formed by different subsystems in the MLDNE, while the 
final result of the functions remains transparent to the 
external nodes and end-stations. As will be appreciated from 
the discussion below and the diagram in FIG. 1, the MLDNE
has a scalable architecture which allows the designer to 
predictably increase the number of external connections by 
adding additional subsystems, thereby allowing greater flex- 
ibility in defining the MLDNE as a stand alone router. 

As illustrated in block diagram form in FIG. 1, the
MLDNE 101 contains a number of subsystems 110 that are 
fully meshed and interconnected using a number of internal 
links 141 to create a larger switch. At least one internal link 
couples any two subsystems. Each subsystem 110 includes 
a switch element 100 coupled to a forwarding and filtering
database 140, also referred to as a forwarding database. The 
forwarding and filtering database may include a forwarding 
memory 113 and an associated memory 114. The forwarding 
memory (or database) 113 stores an address table used for 
matching with the headers of received packets. The associated
memory (or database) stores data associated with each
entry in the forwarding memory that is used to identify 
forwarding attributes for forwarding the packets through the 
MLDNE. A number of external ports (not shown) having 
input and output capability interface the external connections
117. In one embodiment, each subsystem supports
multiple Gigabit Ethernet ports, Fast Ethernet ports and 
Ethernet ports. Internal ports (not shown) also having input 
and output capability in each subsystem couple the internal 
links 141. Using the internal links, the MLDNE can connect
multiple switching elements together to form a multigigabit 
switch. 

The MLDNE 101 further includes a central processing 
system (CPS) 160 that is coupled to the individual subsystems
110 through a communication bus 151 such as the
peripheral components interconnect (PCI). The CPS 160 
includes a central processing unit (CPU) 161 coupled to a 
central memory 163. Central memory 163 includes a copy of 
the entries contained in the individual forwarding memories 
113 of the various subsystems. The CPS has a direct control
and communication interface to each subsystem 110 and 
provides some centralized communication and control 
between switch elements. 

AN EXEMPLARY SWITCH ELEMENT

FIG. 2 is a simplified block diagram illustrating an 
exemplary architecture of the switch element of FIG. 1. The 
switch element 100 depicted includes a central processing
unit (CPU) interface 215, a switch fabric block 210, a 
network interface 205, a cascading interface 225, and a 
shared memory manager 220. 

Ethernet packets may enter or leave the network switch 
element 100 through any one of the three interfaces 205, 
215, or 225. In brief, the network interface 205 operates in 
accordance with a corresponding Ethernet protocol to 
receive Ethernet packets from a network (not shown) and to 
transmit Ethernet packets onto the network via one or more 
external ports (not shown). An optional cascading interface 
225 may include one or more internal links (not shown) for 
interconnecting switching elements to create larger 
switches. For example, each switch element 100 may be 
connected together with other switch elements in a full mesh 
topology to form a multi-layer switch as described above.
Alternatively, a switch may comprise a single switch ele- 
ment 100 with or without the cascading interface 225. 

The CPU 161 may transmit commands or packets to the 
network switch element 100 via the CPU interface 215. In
this manner, one or more software processes running on the 
CPU 161 may manage entries in an external forwarding and 
filtering database 140, such as adding new entries and 
invalidating unwanted entries. In alternative embodiments, 
however, the CPU 161 may be provided with direct access 
to the forwarding and filtering database 140. In any event, 
for purposes of packet forwarding, the CPU port of the CPU 
interface 215 resembles a generic input port into the switch 
element 100 and may be treated as if it were simply another 
external network interface port. However, since access to the 
CPU port occurs over a bus such as a peripheral components 
interconnect (PCI) bus, the CPU port does not need any 
media access control (MAC) functionality. 

Returning to the network interface 205, the two main 
tasks of input packet processing and output packet process- 
ing will now briefly be described. Input packet processing 
may be performed by one or more input ports of the network 
interface 205. Input packet processing includes the follow- 
ing: (1) receiving and verifying incoming Ethernet packets, 
(2) modifying packet headers when appropriate, (3) request- 
ing buffer pointers from the shared memory manager 220 for 
storage of incoming packets, (4) requesting forwarding 
decisions from the switch fabric block 210, (5) transferring 
the incoming packet data to the shared memory manager 220 
for temporary storage in an external shared memory 230, 
and (6) upon receipt of a forwarding decision, forwarding
the buffer pointer(s) to the output port(s) indicated by the
forwarding decision. Output packet processing may be per- 
formed by one or more output ports of the network interface 
205. Output processing includes requesting packet data from 
the shared memory manager 220, transmitting packets onto 
the network, and requesting deallocation of buffer(s) after 
packets have been transmitted. 

The network interface 205, the CPU interface 215, and the 
cascading interface 225 are coupled to the shared memory 
manager 220 and the switch fabric block 210. Preferably, 
critical functions such as packet forwarding and packet 
buffering are centralized as shown in FIG. 2. The shared 
memory manager 220 provides an efficient centralized inter- 
face to the external shared memory 230 for buffering of 
incoming packets. The switch fabric block 210 includes a 
search engine and learning logic for searching and main- 
taining the forwarding and filtering database 140 with the 
assistance of the CPU 161. 

The centralized switch fabric block 210 includes a search 
engine that provides access to the forwarding and filtering 
database 140 on behalf of the interfaces 205, 215, and 225.
Packet header matching, Layer 2 based learning, Layer 2 and
Layer 3 packet forwarding, filtering, and aging are exemplary
functions that may be performed by the switch fabric block 210.
Each input port is coupled with the switch fabric block 210 to
receive forwarding decisions for received packets. The
forwarding decision indicates the outbound port(s) (e.g.,
external network port or internal cascading port) upon which the
corresponding packet should be transmitted. Additional
information may also be included in the forwarding decision to
support hardware routing such as a new MAC destination address
(DA) for MAC DA replacement. Further, a priority indication may
also be included in the forwarding decision to facilitate
prioritization of packet traffic through the switch element 100.

In the present embodiment, Ethernet packets are centrally
buffered and managed by the shared memory manager 220. The
shared memory manager 220 interfaces every input port and output
port and performs dynamic memory allocation and deallocation on
their behalf, respectively. During input packet processing, one
or more buffers are allocated in the external shared memory 230
and an incoming packet is stored by the shared memory manager
220 responsive to commands received from the network interface
205, for example. Subsequently, during output packet processing,
the shared memory manager 220 retrieves the packet from the
external shared memory 230 and deallocates buffers that are no
longer in use. To assure no buffers are released until all
output ports have completed transmission of the data stored
therein, the shared memory manager 220 preferably also tracks
buffer ownership.

INPUT PORT/SWITCH FABRIC INTERFACE

Before describing the internal details of the switch fabric 210,
the interface between the input ports (e.g., any port on which
packets may be received) and the switch fabric 210 will now
briefly be discussed. Input ports in each of the CPU interface
215, the network interface 205, and the cascading interface 225
request forwarding decisions for incoming packets from the
switch fabric 210. According to one embodiment of the present
invention, the following interface is employed:

(1) Fwd_Req[N:0]: Forward Request Signals

These forward request signals are output by the input ports to
the switch fabric 210. They have two purposes. First, they serve
as an indication to the switch fabric 210 that the corresponding
input port has received a valid packet header and is ready to
stream the packet header to the switch fabric. A header transfer
grant signal (see Hdr_Xfr_Gnt[N:0] below) is expected to be
asserted before transfer of the packet header will begin.
Second, these signals serve as a request for a forwarding
decision after the header transfer grant is detected. The
forward request signals are deasserted in the clock period after
a forwarding decision acknowledgment is detected from the switch
fabric 210 (see Fwd_Ack[N:0] below).

(2) Hdr_Xfr_Gnt[N:0]: Header Transfer Grant Signals

These header transfer grant signals are output by the switch
fabric 210 to the input ports. More specifically, these signals
are output by the switch fabric's header preprocessing logic
that will be described further below. At any rate, the header
transfer grant signal indicates the header preprocessing logic
is ready to accept the packet header from the corresponding
input port. Upon detecting the assertion of the header transfer
grant, the corresponding input port will begin streaming
continuous header fields to the switch fabric 210.

(3) Hdr_Bus[X:0][N:0]: The Dedicated Header Bus

The header bus is a dedicated X-bit wide bus from each input
port to the switch fabric 210. In one embodiment, X is 16,
thereby allowing the packet header to be transferred as double
bytes.

(4) Fwd_Ack[N:0]: Forwarding Decision Acknowledgment Signals

These forwarding decision acknowledgment signals are generated
by the switch fabric 210 in response to corresponding forwarding
request signals from the input ports (see Fwd_Req[N:0] above).
These signals are deasserted while the forwarding decision is
not ready. When a forwarding decision acknowledgment signal does
become asserted, the corresponding input port should assume the
forwarding decision bus (see Fwd_Decision[Y:0] below) has a
valid forwarding decision. After detecting its forwarding
decision acknowledgment, the corresponding input port may make
another forwarding request, if needed.

(5) Fwd_Decision[Y:0]: Shared Forwarding Decision Bus

This forwarding decision bus is shared by all input ports. It
indicates the output port number(s) on which to forward the
packet. The forwarding decision may also include data indicative
of the outgoing packet's priority, VID insertion, DA
replacement, and other information that may be useful to the
input ports.

SWITCH FABRIC OVERVIEW

Having described the interface between the input ports and the
switch fabric 210, the internal details of the switch fabric 210
will now be described. Referring to FIG. 3, a block diagram of
an exemplary switch fabric 210 is depicted. In general, the
switch fabric 210 is responsible for directing packets from an
input port to an output port. The goal of the switch fabric 210
is to generate forwarding decisions to the input ports in the
shortest time possible to keep the delay through the switch low
and to achieve wire speed switching on all ports. The primary
functions of the switch fabric are performing real-time packet
header matching, Layer 2 (L2) based learning, L2 and Layer 3
(L3) aging, forming L2 and L3 search keys for searching and
retrieving forwarding information from the forwarding database
memory 140 on behalf of the input ports, and providing a command
interface for software to efficiently manage entries in the
forwarding database memory 140.

Layer 2 based learning is the process of constantly updating the
MAC address portion of the forwarding database 140 based on the
traffic that passes through the switching device. When a packet
enters the switching device, an entry is created (or an existing
entry is updated) in the database that correlates the MAC source
address (SA) of the packet with the input port upon which the
packet arrived. In this manner, a switching device "learns" on
which subnet a node resides.

Aging is carried out on both link and network layers. It is the
process of time stamping entries and removing expired entries
from the forwarding database memory 140. There are two types of
aging: (1) aging based on MAC SA, and (2) aging based on MAC
destination address (DA). The former is for Layer 2 aging and
the latter aids in removal of inactive Layer 3 flows. Thus,
aging helps reclaim inactive flow space for new flows. At
predetermined time intervals, an aging field is set in the
forwarding database entries. Entries that are found during MAC
SA or MAC DA searching will have their aging fields cleared.
Thus, active entries will have an aged bit set to zero, for
example. Periodically, software or
hardware may remove the inactive (expired) entries from the
forwarding database memory 140, thereby allowing for
more efficient database management. Aging also enables
connectivity restoration to a node that has "moved and kept
silent" since it was learned. Such a node can only be reached
through flooding. 
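
The learning and aging behavior described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the hardware implementation: the dictionary-based table, the class name ForwardingTable, and the method names learn, lookup, mark_all_aged, and purge_expired are all hypothetical and do not appear in this description.

```python
class ForwardingTable:
    """Toy L2 forwarding table keyed by MAC address (hypothetical model)."""

    def __init__(self):
        # MAC address -> {"port": arrival port, "aged": aging bit}
        self.entries = {}

    def learn(self, mac_sa, input_port):
        # Create or update the entry correlating the source address with
        # the port the packet arrived on; a hit also clears the aging bit.
        self.entries[mac_sa] = {"port": input_port, "aged": False}

    def lookup(self, mac_da):
        # A successful destination search marks the entry as active.
        entry = self.entries.get(mac_da)
        if entry is None:
            return None  # unknown destination: the caller would flood
        entry["aged"] = False
        return entry["port"]

    def mark_all_aged(self):
        # At each predetermined interval, set the aging bit in every entry.
        for entry in self.entries.values():
            entry["aged"] = True

    def purge_expired(self):
        # Entries whose bit is still set were not found by any search
        # since the last interval, so they are removed.
        expired = [mac for mac, e in self.entries.items() if e["aged"]]
        for mac in expired:
            del self.entries[mac]
        return expired


table = ForwardingTable()
table.learn("00:a0:c9:00:00:01", input_port=3)
table.learn("00:a0:c9:00:00:02", input_port=7)
table.mark_all_aged()
table.lookup("00:a0:c9:00:00:01")   # traffic keeps this entry active
removed = table.purge_expired()     # the silent node's entry is removed
```

After the purge, a packet addressed to the removed node finds no entry and would be flooded, which is how a node that moved and stayed silent becomes reachable again.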

Before discussing the exemplary logic for performing 
search key formation, the process of search key formation 
will now briefly be described. According to one embodiment 
of the present invention, packets are broadly categorized into
one of two groups, either L2 entries or L3 entries. The L3 
entries may be further classified as being part of one of 
several header classes. Exemplary header classes include: 
(1) an Address Resolution Protocol (ARP) class indicating 
the packet header is associated with an ARP packet; (2) a 
reverse ARP (RARP) class indicating the packet header is 
associated with a RARP packet; (3) a PIM class indicating 
the packet header is associated with a PIM packet; (4) a 
Resource Reservation Protocol (RSVP) class indicating the packet
header is associated with an RSVP packet; (5) an Internet
Group Management Protocol (IGMP) class indicating the
packet header is associated with an IGMP packet; (6) a
Transmission Control Protocol (TCP) flow class indicating 
the packet header is associated with a TCP packet; (7) a 
non-fragmented User Datagram Protocol (UDP) flow class 
indicating the packet header is associated with a non- 
fragmented UDP packet; (8) a fragmented UDP flow class 
indicating the packet header is associated with a fragmented 
UDP packet; (9) a hardware routable Internet Protocol (IP) 
class indicating the packet header is associated with a 
hardware routable IP packet; and (10) an IP version six
(IPv6) class indicating the packet header is associated with an
IPv6 packet.

In one embodiment of the present invention, search keys 
are formed based upon an encoding of the header class and
selected information from the incoming packet's header. L2
search keys may be formed based upon the header class, the
L2 address and the VID. L3 search keys may be formed
based upon the header class, an input port list, and selectable
L3 header fields based upon the header class, for example.
Masks may be provided on a per header class basis in local 
switch element 100 memory to facilitate the header field 
selection, in one embodiment. 
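
A minimal sketch of mask-based key formation is given below. It is illustrative only: the class encodings, the 12-byte L3 field layout, the 2-byte input port list, the key byte order, and the names l2_search_key, l3_search_key, and L3_MASKS are assumptions made for the example, since this description does not specify key formats.

```python
# Hypothetical header class encodings (actual encodings are not given).
CLASS_L2 = 0x0
CLASS_TCP_FLOW = 0x6
CLASS_IP_ROUTABLE = 0x9

# Assumed L3 field block: source IP (4 bytes) | destination IP (4 bytes)
# | source port (2 bytes) | destination port (2 bytes).
L3_FIELD_BYTES = 12

# Hypothetical per-class masks selecting which L3 bytes enter the key.
L3_MASKS = {
    CLASS_TCP_FLOW: bytes([0xFF] * 12),                       # full flow
    CLASS_IP_ROUTABLE: bytes([0x00] * 4 + [0xFF] * 4 + [0x00] * 4),  # DA only
}


def l2_search_key(header_class, mac_address, vid):
    """Key = header class | 6-byte L2 address | 2-byte VLAN ID."""
    if len(mac_address) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return bytes([header_class]) + mac_address + vid.to_bytes(2, "big")


def l3_search_key(header_class, input_port_list, l3_fields):
    """Key = header class | input port list | masked L3 header fields."""
    if len(l3_fields) != L3_FIELD_BYTES:
        raise ValueError("expected 12 bytes of L3 fields")
    mask = L3_MASKS[header_class]
    selected = bytes(f & m for f, m in zip(l3_fields, mask))
    return bytes([header_class]) + input_port_list.to_bytes(2, "big") + selected


# Two packets that differ only in TCP source port.
flow_a = bytes([10, 0, 0, 1, 10, 0, 0, 2, 0x04, 0x00, 0x00, 0x50])
flow_b = bytes([10, 0, 0, 1, 10, 0, 0, 2, 0x04, 0x01, 0x00, 0x50])
```

Under the assumed masks, the two packets produce the same key when classified as hardware routable IP (only the destination address is selected) but different keys when classified as TCP flows, which is the kind of per-class field selection the masks are meant to provide.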

In the embodiment depicted in FIG. 3, the switch fabric 
210 includes a header preprocess arbiter 360, packet header
preprocessing logic 305, a search engine 370, learning logic 
350, a software command execution block 340, and a 
forwarding database memory interface 310. 

The header preprocess arbiter 360 is coupled to the packet 
header preprocessing logic 305 and to the input ports of the
network interface 205, the cascading interface 225, and the 
CPU interface 215. The input ports transfer packet headers 
to the switch fabric 210 and request forwarding decisions in 
the manner described above, for example. 
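
The request, grant, and acknowledge sequence of the input port interface (Fwd_Req, Hdr_Xfr_Gnt, Hdr_Bus, Fwd_Ack, Fwd_Decision) can be modeled from the input port's side as a small state machine. The sketch below is an assumption-laden software model, not a description of the actual logic: the state names, the InputPort class, and the one-call-per-clock convention are hypothetical.

```python
from enum import Enum, auto


class PortState(Enum):
    IDLE = auto()
    REQUESTING = auto()      # Fwd_Req asserted, waiting for Hdr_Xfr_Gnt
    STREAMING = auto()       # header words driven onto Hdr_Bus
    WAIT_DECISION = auto()   # Fwd_Req still asserted, waiting for Fwd_Ack


class InputPort:
    """Hypothetical model of one input port's side of the handshake."""

    def __init__(self, header_words):
        self.header_words = list(header_words)  # e.g. 16-bit words
        self.state = PortState.IDLE
        self.fwd_req = False
        self.decision = None

    def clock(self, hdr_xfr_gnt=False, fwd_ack=False, fwd_decision=None):
        """Sample inputs for one clock; return the Hdr_Bus word, if any."""
        if self.state is PortState.IDLE:
            if self.header_words:
                self.fwd_req = True              # valid header received
                self.state = PortState.REQUESTING
        elif self.state is PortState.REQUESTING:
            if hdr_xfr_gnt:                      # fabric ready for header
                self.state = PortState.STREAMING
        elif self.state is PortState.STREAMING:
            word = self.header_words.pop(0)
            if not self.header_words:
                self.state = PortState.WAIT_DECISION
            return word
        elif self.state is PortState.WAIT_DECISION:
            if fwd_ack:                          # decision bus is valid
                self.decision = fwd_decision
                self.fwd_req = False             # deasserted after the ack
                self.state = PortState.IDLE
        return None


port = InputPort([0x0011, 0x2233])
port.clock()                                  # asserts Fwd_Req
port.clock(hdr_xfr_gnt=True)                  # grant detected
w1 = port.clock()                             # first header word
w2 = port.clock()                             # last header word
port.clock()                                  # decision not ready yet
port.clock(fwd_ack=True, fwd_decision=0b0100) # decision latched
```

Fwd_Req stays asserted through the header transfer and the wait for a decision, reflecting its two purposes, and drops only once the acknowledgment has been sampled.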

The switch fabric 210 may support mixed port speeds by
giving priority to the faster network links. For example, the 
header preprocess arbiter 360 may be configured to arbitrate 
between the forwarding requests in a prioritized round robin 
fashion giving priority to the faster interfaces by servicing 
each fast interface (e.g., Gigabit Ethernet port) for each N
slower interfaces (e.g., Fast Ethernet ports). 
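
One possible reading of this prioritized round robin scheme is sketched below: every fast interface is serviced once for each N slow interfaces serviced. That interpretation, the function name service_order, and the port labels are assumptions for illustration; the sketch also ignores whether a port actually has a pending request.

```python
from itertools import cycle


def service_order(fast_ports, slow_ports, n, rounds):
    """Yield ports in service order: all fast ports, then the next n slow ports."""
    slow = cycle(slow_ports) if slow_ports else None
    for _ in range(rounds):
        # Each round, every fast interface gets one turn.
        yield from fast_ports
        # Then n slow interfaces are serviced, continuing round robin
        # from where the previous round stopped.
        if slow is not None:
            for _ in range(n):
                yield next(slow)


order = list(service_order(["G0", "G1"], ["F0", "F1", "F2", "F3"], n=2, rounds=2))
```

With two fast ports, four slow ports, and N equal to 2, each fast port is serviced twice in the time each slow port is serviced once.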

Upon selecting a forward request to service, the header 
preprocess arbiter 360 transfers the corresponding packet 
header to the header preprocessing logic 305. The header
preprocessing logic 305 performs L2 encapsulation filtering
and alignment, and L3 header comparison and selection.
The search engine 370 is coupled to the forwarding
database memory interface 310 for making search requests 
and to the header preprocessing logic 305 for information 
for generating search keys. The search engine 370 is also 
coupled to the learning logic 350 to trigger the learning 
processing. The search engine 370 contains logic for sched- 
uling and performing accesses into the forwarding database 
memory 140 and executes the forward and filter algorithm 
including performing search key formation, merging L2 and 
L3 results retrieved from the forwarding database memory 
140, filtering, and generating forwarding decisions to the 
requesting input ports, etc. For purposes of learning, updated 
forwarding database entry information, such as a cleared age
bit or a modified output port list, is provided by the learning 
logic 350 at the appropriate time during the searching cycle 
for update of the forwarding database memory 140. Finally, 
as will be discussed further below, when search results 
become available from the forwarding database memory 
140, the search engine 370 generates and transfers a for- 
warding decision to the requesting input port. 

The forwarding database memory interface 310 accepts 
and arbitrates access requests to the forwarding database 
memory 140 from the search engine 370 and the software 
command execution block 340. 

The software command execution block 340 is coupled to 
the CPU bus. Programmable command, status, and internal 
registers may be provided in the software command execu- 
tion block 340 for exchanging information with the CPU 
161. Importantly, by providing a relatively small command 
set to the CPU, the switch fabric 210 shields the CPU from 
the tens or hundreds of low-level instructions that may be 
required depending upon the forwarding database memory 
implementation. For example, in an architecture providing 
the CPU with direct access to a content addressable memory, 
a great deal of additional software would be
required to access the forwarding database memory. This 
additional software would be unnecessarily redundant, in 
light of the fact that the switch fabric 210 already has
knowledge of the forwarding database memory 140 inter- 
face. 

Additional efficiency considerations are also addressed by
the present invention with respect to architectures having 
distributed forwarding databases. For example, in a distrib- 
uted architecture, it may be desirable to keep an image of the 
entire forwarding database in software. If this is the case, 
the software will presumably need to periodically read all
entries from each of the individual forwarding databases. 
Since the forwarding database(s) may be very large, many 
inefficient programmed input/outputs (PIOs) may be 
required by an architecture providing the CPU with direct 
access to the forwarding database(s). 

Thus, it would be advantageous to employ the switch 
fabric 210 as an intermediary between the CPU 161 and the 
forwarding database 140 as discussed herein. According to 
one embodiment of the present invention, the software 
command execution block 340 may provide a predetermined 
set of commands to the software for efficient access to and 
maintenance of the forwarding database memory 140. The 
predetermined set of commands described below has been
defined in such a way so as to reduce overall PIOs. These 
commands as well as the programmable registers will be 
discussed in further detail below. 
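
As a rough illustration of how such an interface can reduce PIOs, the sketch below models an auto-incrementing address counter of the kind this description lists among the programmable registers. It is a simplified assumption: the CommandBlock class and its methods are hypothetical, each method call is counted as one PIO, and command and status register accesses are left out.

```python
class CommandBlock:
    """Hypothetical model of sequential reads through an address counter."""

    def __init__(self, memory):
        self.memory = list(memory)
        self.address_counter = 0
        self.pio_count = 0

    def write_address(self, address):
        # One PIO: program the start address of a sequence.
        self.pio_count += 1
        self.address_counter = address

    def read_entry(self):
        # One PIO: read the current location; the counter then advances
        # automatically, so no address write is needed for the next read.
        self.pio_count += 1
        value = self.memory[self.address_counter]
        self.address_counter += 1
        return value


block = CommandBlock(memory=range(100, 108))
block.write_address(2)
entries = [block.read_entry() for _ in range(4)]
```

In this model, reading four consecutive entries costs five PIOs (one address write plus four reads) rather than the eight that would be needed if the address had to be rewritten before every read.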

An exemplary set of registers includes the following: (1)
a command and status register for receiving commands from
the CPU 161 and indicating the status of a pending command;
(2) a write new entry register for temporarily storing
a new entry to be written to the forwarding database 140; (3)
a write key register for storing the key used to locate the
appropriate forwarding database entry; (4) a write data
register for storing data to be written to the forwarding
database 140; (5) an address counter register for storing the
location in the forwarding database memory to read or
update; (6) a read entry register for storing the results of a
read entry operation; and (7) a read data register for storing
the results of other read operations.

In one embodiment of the present invention, an address
counter register is used to facilitate access to the forwarding
database memory 140. The software only needs to program
the address register with the start address of a sequence of
reads/writes prior to the initial read/write of the sequence.
After the initial memory access, the address register will be
automatically incremented for subsequent accesses.
Advantageously, in this manner, additional PIOs are saved,
because the software is not required to update the address
prior to each memory access.

The software command execution block 340 is further
coupled to the forwarding database memory interface 310.
Commands and data are read from the programmable registers
by the software command execution block 340 and
appropriate forwarding database memory access requests
and events are generated as described in further detail with
reference to FIG. 9. The software command execution block
340 may also provide status of the commands back to the
software via status registers. In this manner, the software
command execution block 340 provides hardware assisted
CPU access to the forwarding database memory 140.

PACKET HEADER PROCESSING

FIG. 4 illustrates the portions of a generic packet header
that are operated upon by the pipelined header preprocessing
subblocks of FIG. 5 according to one embodiment of the
present invention. According to this embodiment, a packet
header 499 is partitioned into four portions, an L2 header
portion 475, an L2 encapsulation portion 480, an L3 address
independent portion 485, and an L3 address dependent
portion 490.

In this example, the L2 header portion 475 may comprise
a MAC SA field and a MAC DA field. Depending upon the
type of encapsulation (e.g., IEEE 802.1Q tagged or LLC-SNAP),
the L2 encapsulation portion may include a virtual
local area network (VLAN) tag or an 802.3 type/length field
and an LLC SNAP field. The L3 address independent
portion 485 may comprise an IP flags/fragment offset field
and a protocol field. Finally, the L3 address dependent
portion 490 may comprise an IP source field, an IP destination
field, a TCP source port, and a TCP destination port.
Note that the relative position of fields in the L3 address
independent portion 485 and the L3 address dependent
portion 490 may be different depending upon the type of
encapsulation in the L2 encapsulation portion 480.

FIG. 5 illustrates pipelined header preprocessing subblocks
according to one embodiment of the present invention.
According to this embodiment, the header preprocessing
logic 305 may be implemented as a four stage pipeline.
Each stage in the pipeline operates on a corresponding
portion of the packet header 499. The pipeline depicted
includes four stage arbiters 501-504, an address accumulation
block 510, an encapsulation block 520, an L3 header
class matching block 530, and an L3 address dependent
block 540. In this example, the header preprocessing logic
305 may simultaneously process packet headers from four
input ports. For example, the address accumulation block
510 may be processing the L2 header portion 475 of a packet
from a first input port, the encapsulation block 520 may be
processing the L2 encapsulation portion 480 of a packet
from a second input port, the L3 header class matching block
530 may be processing the L3 address independent portion
485 of a packet from a third input port, and the L3 address
dependent block 540 may be processing the L3 address
dependent portion 490 of a packet from a fourth input port.

Importantly, while the present embodiment is illustrated
with reference to four pipeline stages, it is appreciated that
more or fewer stages may be employed and different groupings
of packet header information may be used. The present
identification of header portions depicted in FIG. 4 has been
selected for convenience. The boundaries for these header
portions 475-490 are readily identifiable based upon known
characteristics of the fields within each of the exemplary
header portions 475-490. Further, the header portions
475-490 can be processed in approximately equal times.

In any event, continuing with the present example, the
arbiters 501-504 coordinate access to the stages of the
pipeline. The arbiters 501-504 function so as to cause a
given packet to be sequentially processed one stage at a time
starting with the address accumulation block 510 and ending
with the L3 address dependent block 540. The first stage of
the pipeline, the address accumulation block 510, is configured
to extract the MAC SA and MAC DA from the L2
header portion 475 of the packet header. The address
accumulation block 510 then transfers the extracted information
to the search engine for use as part of the L2 search key 545.

The encapsulation block 520 is configured to determine
the type of encapsulation of the L2 encapsulation portion
480 of the packet header. As indicated above, the relative
positioning of fields following the L2 encapsulation portion
varies depending upon the type of encapsulation employed.
Therefore, the encapsulation block further calculates an
offset from the start of the L2 encapsulation portion 480 to
the start of the L3 address independent portion 485. The
offset may then be used by the subsequent stages to align the
packet header appropriately.

The L3 header class matching block 530 is configured to
determine the class of the L3 header by comparing the
packet header to a plurality of programmable registers that
may contain predetermined values known to facilitate
identification of the L3 header class. Each programmable register
should be set such that only one header class will match for
any given packet. Once a given register has been determined
to match, a class code is output to the search engine for use
as part of the L3 search key.

The L3 address dependent block 540 is configured to
extract appropriate bytes of the L3 address dependent
portion 490 for use in the L3 search key 555. This extraction
may be performed by employing M CPU programmable
byte and bit masks, for example. The programmable byte
and bit mask corresponding to the header class, determined
by the L3 header class matching block 530, may be used to
mask off the desired fields.

Advantageously, pipelining the header preprocess logic 305
saves hardware implementation overhead. For example,
multiple packet headers may be processed simultaneously in
a single processing block rather than four processing blocks
that would typically be required to implement the logic of
FIG. 5 in a non-pipelined fashion.

Note that additional parallelism may be achieved by further
pipelining the above header preprocessing with forwarding
database memory 140 accesses. For example, there is no
need for L2 searching to wait for a packet to complete the
pipeline of FIG. 5. L2 searches may be initiated as soon as
a packet header completes the first stage and an L2 search
key becomes available from the search engine 370. Subse- 
quent L2 searches may be initiated as new L2 search keys 
become available and after the previous forwarding database 
memory access has completed.
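
The four preprocessing stages described above can be sketched in software. The following is only an illustrative model, not the hardware of FIG. 5: the names `CLASS_REGISTERS` and `preprocess`, the class codes, and the key layout are assumptions, and only untagged Ethernet II and IEEE 802.1Q tagged frames carrying IPv4 are handled (LLC-SNAP is omitted).

```python
import struct

# Hypothetical class registers: (class_code, ethertype, ip_protocol).
CLASS_REGISTERS = [
    (1, 0x0800, 6),    # IPv4 carrying TCP
    (2, 0x0800, 17),   # IPv4 carrying UDP
]

def address_accumulation(hdr):
    """Stage 1: extract MAC DA and MAC SA from the L2 header portion."""
    return hdr[0:6], hdr[6:12]

def encapsulation(hdr):
    """Stage 2: detect an 802.1Q tag; return (ethertype, offset of L3 header)."""
    (tpid,) = struct.unpack_from("!H", hdr, 12)
    if tpid == 0x8100:
        (ethertype,) = struct.unpack_from("!H", hdr, 16)
        return ethertype, 18
    return tpid, 14

def class_match(hdr, ethertype, l3_off):
    """Stage 3: return the class code of the single matching register, or 0."""
    if ethertype != 0x0800:
        return 0
    protocol = hdr[l3_off + 9]
    for code, reg_type, reg_proto in CLASS_REGISTERS:
        if (reg_type, reg_proto) == (ethertype, protocol):
            return code
    return 0

def address_dependent(hdr, l3_off):
    """Stage 4: extract IP source/destination addresses and both ports."""
    ihl = (hdr[l3_off] & 0x0F) * 4
    addresses = hdr[l3_off + 12:l3_off + 20]
    ports = hdr[l3_off + ihl:l3_off + ihl + 4]
    return addresses + ports

def preprocess(hdr):
    """Run the four stages in order; return (l2_key_parts, l3_key or None)."""
    mac_da, mac_sa = address_accumulation(hdr)
    ethertype, l3_off = encapsulation(hdr)
    code = class_match(hdr, ethertype, l3_off)
    l3_key = bytes([code]) + address_dependent(hdr, l3_off) if code else None
    return (mac_da, mac_sa), l3_key
```

Because the second stage returns the offset of the L3 header, the later stages produce the same L3 key for a tagged and an untagged copy of the same packet, which is the alignment role the text assigns to the encapsulation block.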

FORWARDING DATABASE MEMORY 

FIG. 6 illustrates a physical organization of the forward- 
ing database memory of FIG. 2 according to one embodi- 
ment of the present invention. In the embodiment depicted, 
the forwarding database memory 140 includes two cascaded
fully associative content addressable memories (CAMs), 
610 and 620, and a static random access memory (SRAM) 
630. 

The switch fabric 210, in collaboration with the CPU 161, 
maintains a combined link layer (also referred to as "Layer
2") and network layer (also referred to as "Layer 3") packet 
header field-based forwarding and filtering database 140. 
The forwarding and filtering database 140 is stored primarily 
in off-chip memory (e.g., one or more CAMs and SRAM) 
and contains information for making real-time packet
forwarding and filtering decisions.

The assignee of the present invention has found it advan- 
tageous to physically group Layer 2 (L2) entries and Layer 
3 (L3) entries together. Therefore, at times the group of L2
entries may be referred to as the "L2 database" and the group
of L3 entries may be logically referred to as the "L3
database." However, it is important to note that the L2
database and L3 database may span CAMs. That is, either
CAM may contain L2 and/or L3 entries. Both Layer 2 and
Layer 3 forwarding databases are stored in the CAM-RAM
chip set. For convenience, the data contained in the CAM
portion of the forwarding database memory 140 will be
referred to as "associative data," while the data contained in
the SRAM portion of the forwarding database memory 140
will be referred to as "associated data."

As will be explained further below, entries may be 
retrieved from the L2 database using a key of a first size and 
entries may be retrieved from the L3 database using a key of 
a second size. Therefore, in one embodiment, the switching 
element 100 may mix CAMs of different widths. Regardless
of the composition of the forwarding database memory 140, 
the logical view to the switch fabric 210 and the CPU 161 
should be a contiguous memory that accepts bit match 
operations of at least two different sizes, where all or part of 
the memory is as wide as the largest bit match operation.

Different combinations of CAMs are contemplated. 
CAMs of different widths, and different internal structures 
(e.g., mask per bit (MPB) vs. global mask) may be 
employed. In some embodiments, both CAMs 610 and 620 
may be the same width, while in other embodiments the
CAMs 610 and 620 may have different widths. For example, 
in one embodiment, both CAMs 610 and 620 may be 
128-bits wide and 2K deep or the first CAM 610 may be 
128-bits wide and the second CAM 620 may be 64-bits 
wide. Since L2 entries are typically narrower than L3
entries, in the mixed CAM width embodiments, it may be 
advantageous to optimize the narrower CAM width for L2 
entries. In this case, however, only L2 entries can be stored 
in the narrower CAM. However, both L2 and L3 entries may
still reside in the wider CAM.

While the present embodiment has been described with 
reference to cascaded dual CAMs 610 and 620, because the 
logical view is one contiguous block, it is appreciated that 
the L2 and L3 databases may use more or fewer CAMs than
depicted above. For example, the L2 and L3 databases may
be combined in a single memory in alternative embodiments.

Having described an exemplary physical organization of
the forwarding database memory 140, the data contained 
therein will now briefly be described. One or more lines of 
the SRAM 630 may be associated with each entry in the 
CAM portion. It should be noted that a portion of the CAM 
could have been used as RAM. However, one of the goals 
of partitioning the associative data and the associated data is 
to produce a minimum set of associative data for effective 
searching while storing the rest of the associated data in a 
separate memory, a cheaper RAM, for example. As will be 
discussed below, with respect to FIGS. 8A-C, separating the 
associative data and the associated data allows the forward- 
ing database memory 140 to be more efficiently searched 
and updated. Additional advantages are achieved with an 
efficient partitioning between associative data and associ- 
ated data. For example, by minimizing the amount of data in 
the associative data fields, less time and resources are 
required for access and maintenance of the forwarding 
database such as the occasional shuffling of L3 entries that 
may be performed by the CPU 161. Additionally, the effi- 
cient partitioning reduces the amount of time required for 
the occasional snap shots that may be taken of the entire 
forwarding database for maintenance of the aggregate copy 
of forwarding databases in the central memory 163. 

Generally, the associative data is the data with which the 
search key is matched. Packet address information is typi- 
cally useful for this purpose. In one embodiment, the asso- 
ciative data may contain one or more of the following fields 
depending upon the type of entry (e.g., L2 or L3): 

(1) a class field indicating the type of associative entry; 

(2) a media access control (MAC) address which can be
matched to an incoming packet's MAC DA or SA field;

(3) a virtual local area network (VLAN) identifier (VID)
field;

(4) an Internet Protocol (IP) destination address; 

(5) an IP source address; 

(6) a destination port number for TCP or non-fragmented 
UDP flows; 

(7) a source port number for TCP or non-fragmented UDP 
flows; and 

(8) an input port list for supporting efficient multicast 
routing. 

The associative data may also contain variable bits of the 
above by employing a mask per bit (MPB) CAM as 
described above. 
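
To illustrate how a mask per bit CAM lets an entry match on only part of the associative data, the following sketch models each CAM line as a data word plus a mask. It is a toy model under stated assumptions: the class name `MaskPerBitCam`, the convention that a mask bit of 1 means "compare this bit", the rule that the lowest matching index wins, and the sample `port_mask` values are all hypothetical.

```python
class MaskPerBitCam:
    """Toy CAM in which every line stores data plus a per-bit mask."""

    def __init__(self, depth):
        self.lines = [None] * depth          # None marks an empty line

    def write(self, index, data, mask):
        self.lines[index] = (data, mask)

    def search(self, key):
        """Return the index of the first line matching on all unmasked bits."""
        for index, line in enumerate(self.lines):
            if line is not None:
                data, mask = line
                if ((key ^ data) & mask) == 0:
                    return index
        return None

def ip(a, b, c, d):
    """Pack a dotted-quad address into a 32-bit integer."""
    return (a << 24) | (b << 16) | (c << 8) | d

cam = MaskPerBitCam(depth=4)
sram = [None] * 4                            # associated data, one slot per CAM line
cam.write(0, ip(10, 1, 2, 0), 0xFFFFFF00)    # matches 10.1.2.x
sram[0] = {"port_mask": 0b0100}
cam.write(1, ip(10, 0, 0, 0), 0xFF000000)    # matches 10.x.x.x
sram[1] = {"port_mask": 0b0001}
```

The index returned by `cam.search` selects the SRAM slot holding the associated data, which mirrors the split between associative data in the CAM and associated data in the SRAM.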

The associated data generally contains information such 
as an indication of the output port(s) to which the packet 
may be forwarded, control bits, information to keep track of 
the activeness of the source and destination nodes, etc. Also, 
the associated data includes the MAC address for MAC DA 
replacement and the VID for tagging. Specifically, the 
associated data may contain one or more of the following 
fields: 

(1) a port mask indicating the set of one or more ports the 
packet may be forwarded to; 

(2) a priority field for priority tagging and priority
queuing;

(3) a best effort mask indicating which ports should queue 
the packet as best effort; 

(4) a header only field indicating that only the packet 
header should be transferred to the CPU; 

(5) a multicast route field for activating multicast routing; 

(6) a next hop destination address field defining the next
hop L2 DA to be used to replace the original DA;

(7) a new VID field that may be used as a new tag for the
packet when routing between VLANs requires an outgoing
tag different than the incoming tag, for example;

(8) a new tag field indicating that the new VID field
should be used;

(9) an aged source indication for determining which L2
entries are active in the forwarding database, and which
may be removed;

(10) an aged destination indication for implementing
IEEE 802.1d type address aging to determine which L2
or L3 entries are active in the forwarding database, and
which may be removed;

(11) an L2 override indication for instructing the merge
function to use the L2 result for forwarding even when
an L3 result is available;

(12) a static indication for identifying static entries in the
forwarding database that are not subject to automatic
L2 learning or aging;

(13) a distributed flow indication for use over internal
(cascading) links to control the type of matching cycle
(L2 or L3) used on the next switching element; and

(14) a flow rate count for estimating the arrival rate of an
entry or group of entries.

FORWARDING DATABASE SEARCH
SUPERCYCLE DECISION FLOW

FIG. 7 is a flow diagram illustrating the forwarding
database memory search supercycle decision logic according
to one embodiment of the present invention. At step 702,
depending upon whether the packet is being received on an
internal link or an external link, processing continues with
step 704 or step 706 respectively.

Internal link specific processing includes steps 704, 712,
714, 720, 722, and 724. At step 704, since the packet has
been received from an internal link, a check is performed to
determine if the packet is part of a distributed flow. If so,
processing continues with step 714. If the packet is not part
of a distributed flow, then processing continues with step
712.

No learning is performed for the internal links, therefore,
at step 712, only a DA search is performed on the forwarding
database memory 140.

At step 714, an L3 search is performed to retrieve a
forwarding decision for the incoming packet. At step 720, a
determination is made as to whether a matching L3 entry
was found during the search of step 714. If not, then, at step
722, the class action defaults are applied (e.g., forwarding
the packet or the packet header to the CPU 161) and
processing continues at step 780. If a matching L3 entry was
found, then, at step 724, the associated data corresponding
to the matching entry is read from the forwarding database
140 and processing continues at step 780.

At step 708, Layer 2 learning is performed. After the
learning cycle the header class is determined and, at step
716, the header class is compared against the L3 unicast
route header class. If there is a match at step 716, processing
continues with step 726; otherwise, another test is performed
at step 718. At step 718, the header class is compared to the
remaining L3 header classes.

Specific processing for packets associated with headers
classified as L2 includes steps 728 and 738. If the header
class was determined not to be an L3 header class, then at
step 728, a DA search is performed for an L2 forwarding
decision. At step 738, the L2 decision algorithm is applied
and processing continues at step 780.

Specific processing for packets associated with headers
classified as L3 route includes steps 726, 732, 734, 736, 748,
750, 754, 756, 752, 758, and 760. At step 726, an L3 search
is performed on the forwarding database 140. If a matching
L3 entry is found (step 732), then the associated data
corresponding to the matching entry is read from the
forwarding database 140 (step 736); otherwise, at step 734, the
class action options are applied and processing continues
with step 780.

If the packet is a multicast packet (step 748), then the
Time_To_Live (TTL) counter is tested against zero or one
(step 750), otherwise processing continues at step 752. If
TTL was determined to be zero or one, in step 750, then the
packet is forwarded to the CPU 161 prior to continuing with
step 780. Otherwise, at step 754, a destination address search
is performed to retrieve an L2 forwarding entry from the
forwarding database 140 and the L2 decision algorithm is
applied (step 756).

If the packet was determined to be a unicast packet in step
748, then the TTL is tested against zero or one (step 752). If
TTL was determined to be zero or one, then the packet is
forwarded to the CPU 161. Otherwise the L3 match is
employed at step 760 and processing continues with step
780.

Specific processing for packets associated with headers
classified as L3 includes steps 730, 740, 742, 762, 764, 766,
744, 746, 768, and 770. At step 730, an L3 search is
requested from the forwarding database 140. If a matching
L3 entry is found (step 740), then the associated data
corresponding to the matching entry is read from the
forwarding database 140 (step 744). Otherwise, if no
matching L3 entry is found, at step 742 a DA search is
performed to find a matching L2 entry in the forwarding
database 140.

If the forwarding decision indicates the L2 decision
should be used (step 762), then the L2 decision algorithm is
applied at step 770. Otherwise, the class action options are
applied (step 764). If the class action options indicate the
packet is to be forwarded using the L2 results (step 766),
then processing continues at step 770. Otherwise, the
processing branches to step 780.

At step 746, a destination address search is performed on
the forwarding database 140 using the packet's destination
address. If the forwarding decision indicates the L2 decision
should be used (step 768), then processing continues with
step 770. Otherwise, processing continues with step 780
using the associated data retrieved at step 744. At
step 770, the L2 decision algorithm is applied and processing
continues with step 780. Finally, the forwarding decision
is assembled (step 780).

As illustrated by FIG. 7, packet processing for packets
arriving on external links typically requires two to four
lookups (for example, an L2 SA search for learning, a route
class match, and an L2 DA match). However, according to
an embodiment of the present invention, the L2 DA match
may be eliminated whenever a port update access is needed
for L2 learning, thus conserving valuable cycles. While
eliminating the L2 DA match may result in flooding a packet
when the database is updated, this is a relatively rare event.
Advantageously, in this manner, the number of lookups is
normally limited to a maximum of three per packet, without
compromising functionality.

FORWARDING DATABASE SEARCH
SUPERCYCLE TIMING

The search supercycle timing will now be described in
view of the novel partitioning of forwarding information
within the forwarding database 140 and the pipelined
forwarding database access.
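
Before turning to the timing diagrams, it may help to summarize which forwarding database searches the FIG. 7 decision flow implies for a given packet. The sketch below is a simplification under stated assumptions: the function name, its parameters, and the class labels `"L2"`, `"L3_ROUTE"`, and `"L3"` are hypothetical, the class action and CPU forwarding branches are omitted, and a DA search is assumed to follow the L3 search for the remaining L3 header classes (steps 742 and 746).

```python
def supercycle_searches(internal_link, distributed_flow=False,
                        header_class="L2", l3_match=False,
                        multicast=False, ttl=64):
    """List the forwarding-database searches performed for one packet."""
    if internal_link:
        # Steps 704-714: no learning is performed on internal links.
        return ["L3"] if distributed_flow else ["L2_DA"]

    searches = ["L2_SA"]                     # Layer 2 learning
    if header_class == "L2":
        searches.append("L2_DA")             # step 728
    elif header_class == "L3_ROUTE":
        searches.append("L3")                # step 726
        if l3_match and multicast and ttl > 1:
            searches.append("L2_DA")         # step 754
    else:                                    # remaining L3 header classes
        searches.append("L3")                # step 730
        searches.append("L2_DA")             # step 742 or 746
    return searches
```

Under these assumptions no packet needs more than three searches.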

FIGS. 8A-C are timing diagrams illustrating the three 
worst case content addressable memory search supercycles. 
Advantageously, the partitioning of data among the CAM-
RAM architecture described with respect to FIG. 6 allows
forwarding database memory accesses to be pipelined. As 
should be appreciated with reference to FIGS. 8A-C, the 
switch fabric saves valuable cycles by hiding RAM reads 
and writes within CAM accesses. For example, RAM reads 
and writes can be at least partially hidden within the slower 
CAM accesses for each of the supercycles depicted. 

Referring now to FIG. 8A, a search supercycle including
an L2 SA search and an L2 DA search is depicted. The first
CAM short search represents the L2 SA search of the CAMs
610 and 620 for purposes of L2 learning. As soon as the L2
SA search has completed, the associated data in the SRAM 
630 may immediately be updated (e.g., RAM read and RAM 
write) while the next CAM short search (L2 DA search) is 
taking place. 

FIG. 8B illustrates a case in which L2 and L3 searches are
combined. The first CAM short search represents an L2 SA 
search. The CAM long search represents a search of the 
forwarding database 140 for a matching L3 entry. Again, 
upon completion of the L2 SA search if learning is required, 
the SRAM read and write may be performed during the 
following CAM access. If a matching L3 entry is found, then 
the RAM burst read of the associated data corresponding to 
the matching entry can be performed during the second 
CAM short search which represents an L2 DA search. 

FIG. 8C illustrates another case in which L2 and L3 
searches are combined. However, in this case, the second 
CAM access is not performed. 

It should be appreciated that the pipelining of the CAM 
and SRAM effectively decouples the speed of the memories. 
Further, the partitioning between the CAM(s) and the 
SRAM should now be appreciated. Because CAM accesses 
are slower than the accesses to the SRAM, it is desirable to 
allocate as much of the forwarding information as possible 
to the SRAM.

Observing the gaps between the completion of the RAM 
writes and the completion of the second CAM access, it is 
apparent that increasing the speed of the CAM(s) can reduce 
these gaps. The assignee of the present invention anticipates 
future technological developments to allow faster CAMs to 
be developed, thereby creating additional resources for 
additional or faster ports, for example. 
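
The effect of hiding RAM accesses within CAM accesses can be illustrated with a small timing model. This is a sketch only: the cycle counts used below are hypothetical and are not taken from FIGS. 8A-C, and the model assumes CAM accesses run back to back while the RAM work that follows one CAM access may overlap the next CAM access.

```python
def supercycle_length(stages, pipelined=True):
    """Return the supercycle length in cycles.

    stages is a list of (cam_cycles, ram_cycles) pairs. When pipelined,
    the RAM work of a stage may overlap the CAM access of the next stage.
    """
    if not pipelined:
        return sum(cam + ram for cam, ram in stages)
    cam_done = ram_done = 0
    for cam, ram in stages:
        cam_done += cam                      # CAM accesses run back to back
        if ram:
            ram_done = max(ram_done, cam_done) + ram
    return max(cam_done, ram_done)

# L2 SA search followed by a RAM read and write, then an L2 DA search.
fig_8a_like = [(6, 4), (6, 0)]               # hypothetical cycle counts
faster_cam = [(4, 4), (4, 0)]
```

With these illustrative numbers the RAM update finishes two cycles before the second CAM access completes. Shortening each CAM access to four cycles closes that gap and shortens the supercycle.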

While only the pipelined forwarding database access is 
illustrated in FIGS. 8A-C, it is important to note there are 
many other contributions to the overall speed of the switch 
fabric 210 of the present invention. For example, as 
described above, the highly pipelined switch fabric logic 
includes: pipelined header processing, pipelined forwarding 
database access, and pipelined forwarding database/header 
processing. 

GENERALIZED COMMAND PROCESSING 

Having described an exemplary environment in which 
one embodiment of the present invention may be 
implemented, the general command processing will now be 
described. FIG. 9 is a flow diagram illustrating generalized 
command processing for typical forwarding database 
memory access commands according to one embodiment of 
the present invention. At step 910, the CPU programs 
appropriate data registers in the software command
execution block 340 using PIOs. For example, certain forwarding
database access commands are operable upon a specified
address that should be supplied by the CPU 161 prior to
issuing the command.

At step 920, after the CPU 161 has supplied the appropriate
parameters for the command, the CPU issues the
desired command. This may be accomplished by writing a
command code corresponding to the desired command to a
command register.

According to the present embodiment, the CPU 161 polls
a status register until the command issued in step 920 is
complete (step 930). Alternatively, since the commands have
a predetermined maximum response time, the CPU 161 need
not poll the status register, rather the CPU 161 is free to
perform other functions and may check the status register at
a time when the command is expected to be complete.
Another alternative is to provide an interrupt mechanism for
the switch fabric to notify the CPU 161 when the requested
command is complete.

At step 940, after the command is complete, the CPU may
act on the result(s). The results may be provided in memory
mapped registers in the software command execution block
340, for example. In this case, the CPU 161 may retrieve the
result(s) with a PIO read if necessary.

At step 950, the issuance of the command by the CPU 161
triggers logic in the software command execution block 340,
for example, to load the appropriate command parameters.
These command parameters are assumed to have been
previously provided by the CPU 161 at step 910.

At step 960, the software command execution block 340
issues the appropriate forwarding database memory specific
command(s) to perform the requested task. In this manner,
the CPU 161 requires no knowledge of the underlying raw
instruction set for the particular memory or memories used
to implement the forwarding database 140.

At step 970, upon completion of the forwarding database
140 access, the software command execution block 340
updates the result(s) in appropriate interface registers. Then,
at step 980, the software command execution block 340 sets
one or more command status flag(s) to indicate to the CPU
161 that the command is complete. In other embodiments,
one or more additional status flags may be provided to
indicate whether or not the command completed
successfully, whether or not an error occurred, and/or other
information that may be useful to the CPU 161.
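
The handshake of FIG. 9 can be modeled in a few lines. The sketch below is a toy under stated assumptions: the register names, the `DONE` status bit, the command codes, and the `service` method (which stands in for the hardware acting on a pending command) are all hypothetical and do not describe the actual register layout.

```python
DONE = 0x1                                   # hypothetical status bit
READ_ENTRY, WRITE_ENTRY = 1, 2               # hypothetical command codes

class CommandBlock:
    """Toy stand-in for the software command execution block."""

    def __init__(self, memory):
        self.memory = memory
        self.regs = {"address": 0, "write_data": 0, "read_data": 0,
                     "command": 0, "status": DONE}

    def pio_write(self, name, value):
        self.regs[name] = value
        if name == "command":
            self.regs["status"] = 0          # a command is now pending

    def pio_read(self, name):
        return self.regs[name]

    def service(self):
        """Hardware side (steps 950-980): run a pending command, if any."""
        if self.regs["status"] & DONE:
            return
        address = self.regs["address"]                      # step 950
        if self.regs["command"] == READ_ENTRY:              # step 960
            self.regs["read_data"] = self.memory[address]   # step 970
        elif self.regs["command"] == WRITE_ENTRY:
            self.memory[address] = self.regs["write_data"]
        self.regs["status"] = DONE                          # step 980

def run_command(block, command, **params):
    """CPU side (steps 910-940)."""
    for name, value in params.items():       # step 910: program registers
        block.pio_write(name, value)
    block.pio_write("command", command)      # step 920: issue the command
    while not block.pio_read("status") & DONE:   # step 930: poll status
        block.service()                      # stands in for elapsed time
    return block.pio_read("read_data")       # step 940: act on the result
```

The CPU side never touches `memory` directly, which reflects the point that the CPU needs no knowledge of the underlying memory's instruction set.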

Having described the general command processing flow,
an exemplary set of commands and their usage will now be 
described. 

EXEMPLARY COMMAND SET 

According to the present embodiment, one or more commands
may be provided for accessing entries in the forwarding
database 140. In particular, it may be useful to read a
newly learned Layer 2 (L2) entry. To retrieve an L2 entry,
the CPU 161 first programs counters in the switch fabric 210
for addressing the forwarding database memory 140.
Subsequently, the CPU 161 writes the Read_CAM_Entry
command to a command register in the switch fabric 210.
When it is the CPU's turn to be serviced by the switch fabric,
the switch fabric will read the counters and access
the forwarding database memory 140 to retrieve the newly
learned L2 entry. The switch fabric 210 then writes the L2
entry to an output register that is accessible by the CPU 161
and sets the command status done flag. After the command
is complete, and assuming the command was successful, the
CPU 161 may read the L2 entry from the output register.

The Read_CAM_Entry command in combination with
the address counter register is especially useful for burst
reads in connection with updating the software's image of
the entire forwarding database, for example. Because the
hardware will automatically increment the address counter
register at the completion of each memory access, the
software only needs to program the address register prior to
the first memory access. In this manner, the software may
read the entire forwarding database 140 very efficiently.
Similarly, it will be apparent that other forwarding memory
accesses are also simplified such as sequences of writes
during L3 entry initialization. The mechanism for writing
entries to the forwarding database memory 140 will now be
described.
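
The PIO savings from the auto-incrementing address counter can be made concrete with a small counting model. This is an illustrative sketch: the class `FabricRegisters` and the helper `dump` are hypothetical, each register access is counted as one PIO, and status polling is ignored.

```python
class FabricRegisters:
    """Toy register interface that counts programmed I/O accesses."""

    def __init__(self, memory, auto_increment=True):
        self.memory = memory
        self.auto_increment = auto_increment
        self.address = 0
        self.output = None
        self.pio_count = 0

    def write_address(self, value):
        self.pio_count += 1
        self.address = value

    def issue_read(self):
        self.pio_count += 1                  # PIO write to the command register
        self.output = self.memory[self.address]
        if self.auto_increment:
            self.address += 1                # hardware advances the counter

    def read_output(self):
        self.pio_count += 1
        return self.output

def dump(regs, start, count):
    """Read `count` consecutive entries beginning at `start`."""
    entries = []
    regs.write_address(start)
    for i in range(count):
        if i and not regs.auto_increment:
            regs.write_address(start + i)    # software re-programs the address
        regs.issue_read()
        entries.append(regs.read_output())
    return entries
```

In this model a burst of N reads costs 1 + 2N PIOs with auto-increment and 3N without it, so the saving grows with the length of the burst.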

It is also convenient for the CPU 161 to be able to write 
an entry to the forwarding database memory. In particular, it 
may be useful to initialize all L3 entries in the forwarding 
database with a predetermined filler (or dummy) value. This
command may also be useful for invalidation of L3 entries
or before performing a mask update in a mask per bit (MPB)
content addressable memory (CAM), for example. A
Write_CAM_Entry command is provided for this purpose. Again,
the CPU 161 should first program the appropriate counters 
in the switch fabric 210. The CPU 161 also provides the L3 
key to be written to the forwarding database memory 140. 
After these steps, the CPU 161 may issue the Write_CAM_ 
Entry command using a PIO write to the command register. 
The CPU 161 may then begin polling the command status. 
The switch fabric 210 reads the parameters provided by the 
CPU 161 and initializes the corresponding L3 entry to a 
predetermined filler (or dummy). After the write is complete, 
the switch fabric 210 notifies the CPU 161 of the status of 
the command by setting the command status done flag. 

Commands may also be provided for accessing associated 
data. According to one embodiment of the present invention 
the following operations are provided: (1) learning a sup- 
plied address; (2) reading associated data corresponding to 
a supplied search key; (3) aging forwarding database entries; 
(4) invalidating entries; (5) accessing mask data, such as 
mask data that may be stored in an MPB CAM, corresponding
to a particular search key; and (6) replacing forwarding 
database entries. 

L2 source address learning may be performed by a 
Learn_L2_SA command. First, the CPU 161 programs the 
appropriate registers in the switch fabric 210 with an L2 
search key and a new entry to insert or a modified entry. 
Then, the CPU 161 issues the Learn_L2_SA command and
begins polling the command status. The switch fabric 210 
reads the data provided by the CPU 161. If an entry is not 
found in the forwarding database 140 that matches the 
supplied address, then the new entry will be inserted into the 
forwarding database. After the insertion is complete or upon 
verifying a matching entry already exists, the switch fabric 
210 notifies the CPU 161 of the status of the command by 
setting the command status done flag. 
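
The behavior described for Learn_L2_SA (insert only when no matching entry exists) can be sketched as follows. The data layout is an assumption made for illustration: `cam` is a list of keys with `None` marking a free line, `sram` holds the associated data at the same index, and the first free line is used for insertion. The table-full case is not addressed by the text and is handled here by returning `None`.

```python
def learn_l2_sa(cam, sram, key, new_entry):
    """Insert new_entry under key unless key is already present.

    Returns the index of the entry, or None if no free line remains.
    """
    if key in cam:
        return cam.index(key)                # matching entry already exists
    if None not in cam:
        return None                          # table full (not covered by the text)
    index = cam.index(None)                  # first free line (assumed policy)
    cam[index] = key
    sram[index] = new_entry
    return index
```

A repeated call with the same key leaves the stored entry untouched, matching the "verifying a matching entry already exists" branch.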

It is also convenient for the CPU 161 to be able to perform 
aging. In particular, it is useful to age L2 and L3 forwarding 
database entries. Age_SA and Age__NDA commands are 
provided for this purpose. The CPU 161 writes the appro- 
priate key and the modified age field to the switch fabric 
interface. Then, CPU 161 issues either the Age_SA com- 
mand or the Age„DA command. The Age_SA command 
sets the source address age field in the L2 entry correspond- 
ing to the provided search key. The Age_D A command sets 
the destination address age field for the L2 or L3 entry 
corresponding to the provided search key. After issuing the 
command, the CPU 161 may begin polling the command 
status. The switch fabric 210 reads the data provided by the 
CPU 161 and updates the appropriate age field in the 
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matching entry. After aging is complete, the switch fabric 
210 notifies the CPU 161 of the status of the command by 
setting the command status done flag. 

The CPU 161 may also need to have the ability to invalidate
forwarding database entries such as aged L2 entries, for example.
The Invalidate_L2_Entry command is provided for this purpose.
Prior to issuing the Invalidate_L2_Entry command, the CPU 161
programs the appropriate address counters in the switch fabric
210. After issuing the command, the CPU 161 may begin polling the
command status. The switch fabric 210 reads the data provided by
the CPU 161 and resets the validity bit at the address counter
location specified. After the entry invalidation is complete, the
switch fabric 210 notifies the CPU 161 of the status of the
command by setting the command status done flag.
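
As a rough illustration of the invalidation sequence, the following sketch models the validity bits as a list and the address counter as a plain attribute. The names `L2TableModel`, `address_counter`, and `invalidate_aged` are hypothetical, and the idea of the CPU walking a list of aged addresses is an assumption about how the command might be used, not something the specification prescribes.

```python
class L2TableModel:
    """Toy model of address-based entry invalidation."""

    def __init__(self, size):
        self.valid = [False] * size  # one validity bit per entry location
        self.address_counter = 0     # programmed by the CPU before the command
        self.done = False            # command status done flag

    def invalidate_l2_entry(self):
        self.done = False
        # Reset the validity bit at the location the address counter names.
        self.valid[self.address_counter] = False
        self.done = True


def invalidate_aged(table, aged_addresses):
    """CPU side: invalidate each address in turn; return how many completed."""
    completed = 0
    for address in aged_addresses:
        table.address_counter = address  # program the address counter
        table.invalidate_l2_entry()      # issue the command
        while not table.done:            # poll the command status
            pass
        completed += 1
    return completed
```

Unlike the aging commands, no search key is involved here: the entry is selected purely by the address counter, so the model never consults the entry contents.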

In embodiments employing MPB CAMs, typically the CAM stores
alternating sets of data and masks. Each set of data has a
corresponding mask. The masks allow programmable selection of
portions of data from the corresponding CAM line. Thus, it is
convenient for the CPU 161 to be able to access the mask data
corresponding to a particular address in the CAM. In particular,
it is useful to update the mask data to select different portions
of particular CAM lines. The Update_Mask command is provided for
this purpose. The CPU 161 programs the address counter register
and programs the new mask into the appropriate register. Then,
the CPU 161 issues the Update_Mask command and may begin polling
the command status. The switch fabric 210 reads the parameters
provided by the CPU 161 and updates the mask data corresponding
to the specified address. After the mask data update is complete,
the switch fabric 210 notifies the CPU 161 of the status of the
command by setting the command status done flag. The CPU 161 may
also read mask data in a similar fashion by employing a Read_Mask
command and providing the appropriate address.
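
To show how a per-line mask might select which portions of a CAM line take part in a match, here is a minimal sketch. It assumes, purely for illustration, that data and mask words alternate in one array (data at even positions, its mask at the next odd position), that a mask bit of 1 means "compare this bit", and that lines are addressed by a simple line index; none of these details are stated in the text above.

```python
class MaskedCamModel:
    """Toy CAM whose lines are stored as alternating data and mask words."""

    def __init__(self, lines):
        # words[2*i] holds the data for line i, words[2*i + 1] holds its mask.
        self.words = []
        for data, mask in lines:
            self.words += [data, mask]

    def update_mask(self, line, new_mask):
        """Model of Update_Mask: overwrite the mask for the given line."""
        self.words[2 * line + 1] = new_mask

    def read_mask(self, line):
        """Model of Read_Mask: return the mask for the given line."""
        return self.words[2 * line + 1]

    def search(self, key):
        """Return the first line whose mask-selected bits equal the key's."""
        for line in range(len(self.words) // 2):
            data = self.words[2 * line]
            mask = self.words[2 * line + 1]
            if (key & mask) == (data & mask):
                return line
        return None
```

For example, a line holding data `0x1234` with mask `0xFF00` matches any key whose upper byte is `0x12`; after updating the mask to `0xFFFF`, only the exact key `0x1234` matches that line.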

Finally, it is desirable to be able to replace entries.
Particularly, it is useful to replace filler (or dummy) L3
entries with new valid L3 entries. The Replace_L3 command is
provided for this purpose. The CPU 161 provides an L3 search key
to the switch fabric 210 and provides the new valid L3 entry.
Then, the CPU 161 issues the Replace_L3 command and may begin
polling the command status. The switch fabric 210 reads the
parameters provided by the CPU 161 and performs a search of the
forwarding database 140 for the matching L3 entry. After locating
the matching L3 entry, the associated data corresponding to the
matching entry is replaced with the new valid L3 entry provided
by the CPU 161. After the L3 entry has been replaced, the switch
fabric 210 notifies the CPU 161 of the status of the command by
setting the command status done flag.
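
The search-then-replace behavior of Replace_L3 can be sketched as follows. The model assumes a key store whose match index addresses a separate associated-data store, with string keys and dictionary entries chosen only for readability. The class and method names are hypothetical, and returning `False` when no entry matches is an assumption, since the text above does not describe that case.

```python
class ForwardingDbModel:
    """Toy forwarding database: keys in one store, associated data in another."""

    def __init__(self):
        self.keys = []     # L3 search keys; list position is the match index
        self.data = []     # associated data, addressed by the match index
        self.done = False  # command status done flag

    def add(self, key, entry):
        self.keys.append(key)
        self.data.append(entry)

    def replace_l3(self, key, new_entry):
        """Search for the matching L3 entry and replace its associated data."""
        self.done = False
        replaced = key in self.keys
        if replaced:
            self.data[self.keys.index(key)] = new_entry
        self.done = True  # report completion to the CPU
        return replaced
```

A filler entry added with `add("10.0.0.1", "FILLER")` keeps its search key after `replace_l3("10.0.0.1", {...})`; only the associated data changes, which is the property that makes filler entries usable as placeholders.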

Importantly, while embodiments of the present invention
have been described with respect to specific commands and
detailed steps for executing particular commands, those of
ordinary skill in the art will appreciate that the present
invention is not limited to any particular set of commands or
sequence of execution.

In the foregoing specification, the invention has been
described with reference to specific embodiments thereof. It
will, however, be evident that various modifications and
changes may be made thereto without departing from the
broader spirit and scope of the invention. For example,
embodiments of the present invention have been described
with reference to specific network protocols such as IP.
However, the method and apparatus described herein are
equally applicable to other types of network protocols. The
specification and drawings are, accordingly, to be regarded
in an illustrative rather than a restrictive sense.

What is claimed is:

1. A switch fabric comprising: 

a search engine for coupling to a forwarding database
memory and a plurality of input ports, the search engine
configured to schedule and perform accesses to the
forwarding database memory and to transfer forwarding
decisions to the plurality of input ports; and

a header processing unit coupled to the search engine and
having an arbitrated interface for coupling to the plurality
of input ports, the header processing unit configured to
receive a packet header from an input port of the
plurality of input ports and to construct a first search
key for accessing the forwarding database memory
based upon a predetermined portion of the packet
header, the predetermined portion of the packet header
being selected based upon a class of a plurality of
classes with which the packet header is associated.

2. The switch fabric of claim 1, wherein the header 
processing unit further comprises the following pipeline 
stages: 

an address accumulation unit, coupled to the plurality of
input ports and the arbitrated interface, for accessing
address information from the packet header;

an encapsulation processing unit, coupled to the plurality
of input ports and the arbitrated interface, for selecting
a predetermined set of fields from the packet header to
determine a type of encapsulation;

a header class matching unit coupled to the plurality of
input ports and the arbitrated interface, the header class
matching including comparison logic to determine a
header class based upon the type of encapsulation and
a predetermined set of fields.

3. The switch fabric of claim 1, wherein the first search
key is a Layer 3 (L3) search key.

4. The switch fabric of claim 3, wherein the header processing
unit is further configured to construct a Layer 2 (L2)
search key for accessing the forwarding database memory.

5. The switch fabric of claim 1, wherein the first search
key is a Layer 2 (L2) search key.

6. The switch fabric of claim 1, wherein the plurality of
input ports may each request for a forwarding decision 
independently of the others. 

7. The switch fabric of claim 1, wherein the forwarding 
database memory comprises one or more content addressable
memories (CAMs) coupled to a random access memory
(RAM). 

8. The switch fabric of claim 1, further including a 
command execution unit configured to interface with a 
processor, the command execution unit further configured to 
access the forwarding database memory on behalf of the
processor. 

9. The network device of claim 1, wherein the forwarding 
database memory comprises a first memory and a second 
memory and wherein the search engine is configured to 
pipeline accesses to the first memory and the second
memory.

10. The switch fabric of claim 3, wherein the L2 search 
key is of a first size and the L3 search key is of a second size, 
the second size being greater than the first size. 

11. A switch fabric comprising:

a search engine for coupling to a forwarding database
memory and a plurality of input ports, the search engine
configured to schedule and perform accesses to the
forwarding database memory and to transfer forwarding
decisions to the plurality of input ports; and

a header processing unit coupled to the search engine and
having an arbitrated interface for coupling to the plurality
of input ports, the header processing unit configured to
receive a packet header from an input port of the
plurality of input ports and to construct a first key and 
a second key for accessing a forwarding database 
memory, the first key comprising one or more fields 
from a first portion of the packet header and having a 
first length, the second key comprising one or more 
fields from a second portion of the packet header and 
having a second length. 

12. The switch fabric of claim 11, wherein the first key is 
a Layer 2 (L2) key for retrieving an L2 entry from the 
forwarding database memory, the second key is a Layer 3 
(L3) key for retrieving an L3 entry from the forwarding 
database memory. 

13. The switch fabric of claim 12, wherein the first length 
is smaller than the second length. 

14. The switch fabric of claim 12, wherein the first portion 
of the packet header comprises a media access control 
(MAC) header, and wherein the second portion of the packet 
header comprises an Internet Protocol (IP) header. 

15. The switch fabric of claim 11, wherein the forwarding 
database memory comprises one or more content address- 
able memories (CAMs) coupled to a random access memory 
(RAM). 

16. The switch fabric of claim 15, wherein an address for 
accessing the RAM includes an index produced by the one 
or more CAMs, and wherein the RAM contains both L2 and 
L3 forwarding information. 

17. The switch fabric of claim 11, wherein the packet 
header includes an L2 header and an L3 header, and wherein 
the header processing unit comprises pipelined logic to 
allow processing of more than one packet header 
simultaneously, the pipeline logic including: 

an encapsulation block configured to determine the type 
of header encapsulation that has been employed in a 
first packet header and to determine an indication of the 
start of the L3 header based upon the type of header 
encapsulation; and 

an L3 header class matching block coupled to the encap- 
sulation block for receiving the indication of the start of 
the L3 header, the L3 header class matching block 
configured to determine a class of a plurality of L3 
classes with which a second packet header is associated 
based upon one or more fields in the L2 header and the 
L3 header. 

18. A network device comprising: 

a plurality of ports including a first port for receiving a 
packet from a network; 

a forwarding memory including a first memory and a 
second memory, the first memory having stored therein 
an associative data entry corresponding to the
packet, the second memory coupled to the first memory 
and having stored therein an associated data entry 
corresponding to the associative data entry, the asso- 
ciative data entry including an indication of a set of 
ports to which the packet should be forwarded; and 

a search engine coupled to the plurality of ports and the 
forwarding memory, the search engine configured to 
schedule and perform accesses to the forwarding 
memory and to transfer the indication to the first port. 

19. The network device of claim 18, wherein the search
engine is configured to eliminate Layer 2 (L2) destination
address (DA) matching whenever an L2 learning cycle is
needed. 

20. The network device of claim 19, wherein the first 
memory comprises one or more content addressable memo- 
ries (CAMs), and wherein the second memory comprises a 
random access memory (RAM). 

21. The network device of claim 18, wherein the first 
memory and the second memory may be accessed in par- 
allel. 

22. The network device of claim 21, wherein the search
engine is configured to pipeline accesses to the first memory 
and the second memory. 

23. The network device of claim 22, wherein the first 
memory comprises one or more content addressable memo- 
ries (CAMs), and wherein the second memory comprises a 
random access memory (RAM). 

***** 